
Storageclusterpeer #2466

Closed

Conversation

Member

@rewantsoni rewantsoni commented Feb 20, 2024

Add a new CR and controller to peer two block pools with the same name.

  1. Used the command operator-sdk create api --group ocs --version v1 --kind StorageClusterPeer --resource --controller to create the CR and controller
  2. Added logic for creation and deletion of StorageClusterPeer

Depends on (in Order)

  1. cephblockpool: allow updation of mirroring field from different components #2480
  2. add rpc call for PeerBlockPool #2457
  3. add rpc call for RevokeBlockPoolPeering #2493

@openshift-ci openshift-ci bot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Feb 20, 2024
Contributor

openshift-ci bot commented Feb 20, 2024

Skipping CI for Draft Pull Request.
If you want CI signal for your change, please convert it to an actual PR.
You can still manually trigger a test run with /test all

Contributor

openshift-ci bot commented Feb 20, 2024

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: rewantsoni
Once this PR has been reviewed and has the lgtm label, please assign nb-ohad for approval. For more information see the Kubernetes Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@rewantsoni rewantsoni force-pushed the storageclusterpeer branch 5 times, most recently from d8b406f to fc34c02 on February 29, 2024 10:57
Member Author

rewantsoni commented Feb 29, 2024

Last 3 commits are up for review

@jarrpa
Member

jarrpa commented Mar 2, 2024

Leave a note in the PR description saying which PR this is dependent on.

Contributor

@leelavg leelavg left a comment

reviewing in phases, will do a more thorough one at the earliest/after draft.

Contributor

I don't think this file is required.

PROJECT (outdated)
Contributor

I think it'd be better to add the operator-sdk command that you used for generating the files to this commit msg.

// StorageClusterPeerReconciler reconciles a StorageClusterPeer object
// nolint:revive
Contributor

pls do mention what aspect of the linter we are skipping in a new line.

}

//+kubebuilder:rbac:groups=ocs.openshift.io,resources=storageclusterpeers,verbs=get;list;watch;create;update;patch;delete
//+kubebuilder:rbac:groups=ocs.openshift.io,resources=storageclusterpeers/status,verbs=get;update;patch
//+kubebuilder:rbac:groups=ocs.openshift.io,resources=storageclusterpeers/finalizers,verbs=update
//+kubebuilder:rbac:groups=ceph.rook.io,resources=cephblockpools;,verbs=list;create;update
//+kubebuilder:rbac:groups=core,resources=secrets,verbs=get
Contributor

I don't think get alone works; basically the underlying infra lists the existing resources and keeps updating the cache, so you need list perms as well.

Member Author

During testing it worked with get as far as I recall; I will check it once again and update it.

Member Author

Tested the changes, only using get works

Contributor

ack, maybe only the client from the runtime manager requires both list & get. Pls recheck that there's no existing rbac for list of secrets and resolve the comment.

// Apply status changes to the StorageClassRequest
statusError := r.Client.Status().Update(r.ctx, r.storageClusterPeer)
if statusError != nil {
r.Log.Info("Failed to update StorageClassRequest status.")
Contributor

storageClassRequest, a typo?

Member Author

Yes, I will fix it


func (r *StorageClusterPeerReconciler) reconcileCephBlockPool() error {

for _, cephBlockPool := range r.cephBlockPoolList.Items {
Contributor

so mirroring on block pools is all or none?

Member Author

Yes, we decided that this will be the initial design, if required we might add annotations to whitelist/blacklist the blockpools

}

func (r *StorageClusterPeerReconciler) newExternalClient() (*providerClient.OCSProviderClient, error) {
ocsClient, err := providerClient.NewProviderClient(r.ctx, r.storageClusterPeer.Spec.OCSAPIServerURI, time.Second*10)
Contributor

why do we need to construct a grpc client on every reconcile?

Member Author

We need the APIServerURI, which will be different for different clusters.

@rewantsoni rewantsoni force-pushed the storageclusterpeer branch 2 times, most recently from d5cac87 to 0847bb6 on March 5, 2024 13:40
@rewantsoni
Member Author

Testing:

Cluster 1(primary):
[rewantsoni ] date; oc get clusterversion -oyaml | grep clusterID; oc get cephblockpool ocs-storagecluster-cephblockpool -oyaml | yq '.spec .mirroring'; oc get storageclusterpeer
Wed Mar  6 11:38:12 IST 2024
    clusterID: 4293c939-42b2-4565-b22f-81f53df19844
{}
No resources found in openshift-storage namespace.

Cluster 2(secondary):
[rewantsoni ] date; oc get clusterversion -oyaml | grep clusterID; oc get cephblockpool ocs-storagecluster-cephblockpool -oyaml | yq '.spec.mirroring'; oc get cephrbdmirror 
Wed Mar  6 11:40:37 IST 2024
    clusterID: a96546a7-5e86-4d6e-aded-b6664c3d8f3e
{}
No resources found in openshift-storage namespace.

After applying StorageClusterPeer:

Cluster 1(primary):
[rewantsoni ] date; oc get clusterversion -oyaml | grep clusterID; oc get cephblockpool ocs-storagecluster-cephblockpool -oyaml | yq '.spec .mirroring'; oc get storageclusterpeer
Wed Mar  6 11:41:48 IST 2024
    clusterID: 4293c939-42b2-4565-b22f-81f53df19844
enabled: true
mode: image
NAME                        AGE   PHASE
storageclusterpeer-sample   18s   Ready

Cluster 2(secondary):
[rewantsoni ] date; oc get clusterversion -oyaml | grep clusterID; oc get cephblockpool ocs-storagecluster-cephblockpool -oyaml | yq '.spec.mirroring'; oc get cephrbdmirror; oc get secret bootstrap-secret-e01e021da3ee4ed84ed95581cb7ae967
Wed Mar  6 11:43:16 IST 2024
    clusterID: a96546a7-5e86-4d6e-aded-b6664c3d8f3e
enabled: true
mode: image
peers:
  secretNames:
    - bootstrap-secret-e01e021da3ee4ed84ed95581cb7ae967
NAME         PHASE
rbd-mirror   Ready
NAME                                                TYPE     DATA   AGE
bootstrap-secret-e01e021da3ee4ed84ed95581cb7ae967   Opaque   2      99s

After deleting the StorageClusterpeer:

Cluster 1(primary):
[rewantsoni ] date; oc get clusterversion -oyaml | grep clusterID; oc get cephblockpool ocs-storagecluster-cephblockpool -oyaml | yq '.spec .mirroring'; oc get storageclusterpeer
Wed Mar  6 11:44:35 IST 2024
    clusterID: 4293c939-42b2-4565-b22f-81f53df19844
{}
No resources found in openshift-storage namespace.

Cluster 2(secondary):
[rewantsoni ] date; oc get clusterversion -oyaml | grep clusterID; oc get cephblockpool ocs-storagecluster-cephblockpool -oyaml | yq '.spec.mirroring'; oc get cephrbdmirror; oc get secret bootstrap-secret-e01e021da3ee4ed84ed95581cb7ae967
Wed Mar  6 11:45:27 IST 2024
    clusterID: a96546a7-5e86-4d6e-aded-b6664c3d8f3e
peers: {}
No resources found in openshift-storage namespace.
Error from server (NotFound): secrets "bootstrap-secret-e01e021da3ee4ed84ed95581cb7ae967" not found

@rewantsoni rewantsoni marked this pull request as ready for review March 6, 2024 06:17
@openshift-ci openshift-ci bot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Mar 6, 2024
@rewantsoni
Member Author

/test ocs-operator-bundle-e2e-aws

labels:
  name: storageclusterpeer-sample
spec:
  ocsApiServerUri: 10.0.0.0/31659
Contributor

intentionally mentioned / inplace of :?

Member Author

No, updated it. Thanks for pointing it out.
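For reference, the fix discussed here is to separate host and port with a colon rather than a slash. A corrected fragment of the sample CR would look like this (the address and port are illustrative values from the original sample, not a prescribed endpoint):

```yaml
labels:
  name: storageclusterpeer-sample
spec:
  ocsApiServerUri: 10.0.0.0:31659
```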

}

// SetupWithManager sets up the controller with the Manager.
func (r *StorageClusterPeerReconciler) SetupWithManager(mgr ctrl.Manager) error {

enqueueStorageClusterPeerRequest := handler.EnqueueRequestsFromMapFunc(
func(context context.Context, obj client.Object) []reconcile.Request {
Contributor

can use ctx inplace of context

func(context context.Context, obj client.Object) []reconcile.Request {
// Get the StorageClusterPeer objects
scpList := &ocsv1.StorageClusterPeerList{}
err := r.Client.List(context, scpList, &client.ListOptions{Namespace: obj.GetNamespace()})
Contributor

so you'll have multiple scp in the same namespace? more like, if you have two blockpools and first has two peers, second has only one peer, should you enqueue all the peer CRs (all 3) or can enqueue peers corresponding to block pool only?

let me know if my understanding is incorrect.

Member Author

A single StorageClusterPeer will peer all the block pools created; if we have two StorageClusterPeers, all the block pools present will be peered with both peer clusters.
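The fan-out described above can be sketched without controller-runtime. The types below are illustrative stand-ins for reconcile.Request, and the function mirrors the EnqueueRequestsFromMapFunc logic under discussion: any watched object change enqueues every StorageClusterPeer in that object's namespace, since a single peer mirrors all block pools.

```go
package main

import "fmt"

// Minimal stand-ins for the controller-runtime types; names are illustrative only.
type NamespacedName struct{ Namespace, Name string }
type Request struct{ NamespacedName NamespacedName }

// mapToStorageClusterPeers fans a change in some watched object out to every
// StorageClusterPeer in the same namespace, producing one reconcile request each.
func mapToStorageClusterPeers(namespace string, peerNames []string) []Request {
	requests := make([]Request, 0, len(peerNames))
	for _, name := range peerNames {
		requests = append(requests, Request{NamespacedName{Namespace: namespace, Name: name}})
	}
	return requests
}

func main() {
	reqs := mapToStorageClusterPeers("openshift-storage", []string{"peer-a", "peer-b"})
	fmt.Println(len(reqs)) // 2
}
```

In the real controller the peer names would come from listing StorageClusterPeerList via the cached client, as shown in the quoted snippet above.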


//remove finalizer
r.Log.Info("removing finalizer from the StorageClusterPeer resource")
r.storageClusterPeer.SetFinalizers(slices.Filter([]string{}, r.storageClusterPeer.GetFinalizers(), func(s string) bool {
Contributor

where is this SetFinalizers defined? why can't we use controllerutil.SetFinalizer and conditionally update the resource?

Member Author

@rewantsoni rewantsoni Mar 6, 2024

SetFinalizers is a pre-defined function, it is defined as:

func (meta *ObjectMeta) SetFinalizers(finalizers []string)            { meta.Finalizers = finalizers }

Updated to use helper function
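The controllerutil helpers suggested by the reviewer also report whether the finalizer list actually changed, so the caller can skip the API-server Update when nothing changed. A minimal stdlib sketch of that semantics (the function names and finalizer string are illustrative, not the real controllerutil code or the ocsv1 constant):

```go
package main

import (
	"fmt"
	"slices"
)

// Illustrative finalizer name; the real one is defined in the ocsv1 package.
const storageClusterPeerFinalizer = "storageclusterpeer.ocs.openshift.io"

// addFinalizer appends the finalizer only if absent and reports whether the
// list changed, so the caller knows whether to call Update on the API server.
func addFinalizer(finalizers []string, f string) ([]string, bool) {
	if slices.Contains(finalizers, f) {
		return finalizers, false
	}
	return append(finalizers, f), true
}

// removeFinalizer filters the finalizer out and reports whether it was present.
func removeFinalizer(finalizers []string, f string) ([]string, bool) {
	out := slices.DeleteFunc(slices.Clone(finalizers), func(s string) bool { return s == f })
	return out, len(out) != len(finalizers)
}

func main() {
	fs, changed := addFinalizer(nil, storageClusterPeerFinalizer)
	fmt.Println(fs, changed) // [storageclusterpeer.ocs.openshift.io] true

	fs, changed = addFinalizer(fs, storageClusterPeerFinalizer)
	fmt.Println(changed) // false: already present, no Update needed

	fs, changed = removeFinalizer(fs, storageClusterPeerFinalizer)
	fmt.Println(fs, changed) // [] true
}
```

As the later review comment notes, mutating only the in-memory object is not enough; the boolean result is what tells the reconciler it must persist the change with an Update call.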

Comment on lines 187 to 188
slices.Contains(r.storageClusterPeer.GetFinalizers(), ocsv1.StorageClusterPeerFinalizer)
if !slices.Contains(r.storageClusterPeer.GetFinalizers(), ocsv1.StorageClusterPeerFinalizer) {
Contributor

redundant op?

Member Author

Yes, updated it

Comment on lines 186 to 204
// add finalizer
slices.Contains(r.storageClusterPeer.GetFinalizers(), ocsv1.StorageClusterPeerFinalizer)
if !slices.Contains(r.storageClusterPeer.GetFinalizers(), ocsv1.StorageClusterPeerFinalizer) {
r.Log.V(-1).Info("finalizer missing on the StorageClusterPeer resource, adding...")
r.storageClusterPeer.SetFinalizers(append(r.storageClusterPeer.GetFinalizers(), ocsv1.StorageClusterPeerFinalizer))
if err := r.update(r.storageClusterPeer); err != nil {
return reconcile.Result{}, fmt.Errorf("failed to update StorageClusterPeer with finalizer: %v", err)
}
}
Contributor

pls use helper from controllerutil.

Comment on lines +229 to +237
_, err := ctrl.CreateOrUpdate(r.ctx, r.Client, &cephBlockPool, func() error {
cephBlockPool.Spec.Mirroring.Enabled = true
cephBlockPool.Spec.Mirroring.Mode = "image"
return nil
})
Contributor

is there any reason in enabling mirroring daemon via gRPC but not locally?

Member Author

The mirror daemon is required on the secondary cluster, not on the primary cluster.


r.storageClusterPeer.Status.Phase = ocsv1.StorageClusterPeerConfiguring

if err := r.reconcileCephBlockPool(); err != nil {
Contributor

maybe reconcileOwnBlockPool sounds better as other one is being called reconcilePeerBlockPool 🤔?

Member Author

how about reconcileLocalBlockPool?

return err
}

_, err := ocsClient.PeerBlockPool(r.ctx, generateSecretName(cephBlockPool.Name, clusterID), secret.Data["pool"], secret.Data["token"])
Contributor

in existing rdr how is the secret name generated?

Member Author

@vbnrh @umangachapagain could you help me in answering this question?


func (r *StorageClusterPeerReconciler) reconcilePeerBlockPool(ocsClient *providerClient.OCSProviderClient) error {

clusterID := util.GetClusterID(r.ctx, r.Client, &r.Log)
Contributor

if clusterID is required for each reconcile, why not store it in the struct itself as it's same throughout the lifetime of cluster?

On another note, I did come across this util func; it seems a bit misleading. It'd have been better if it returned an err on failure; not sure why the func's dependents are ok with an empty val.

Member Author

Yes, even I am not sure why we are okay with returning an empty val.
@iamniting @malayparida2000 could you elaborate on why we are returning an empty val?

Member

This func is called at one place, earlier while constructing the configmap. I am not able to recall why we were returning an empty value. As you are already checking if the value is empty and returning the reconcile, that should be good.

@rewantsoni rewantsoni force-pushed the storageclusterpeer branch 2 times, most recently from 5f71e7e to c08bb37 on March 6, 2024 14:02
@openshift-merge-robot openshift-merge-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Mar 14, 2024
@openshift-merge-robot openshift-merge-robot added needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. and removed needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. labels Mar 18, 2024
@openshift-merge-robot openshift-merge-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Mar 26, 2024
@openshift-merge-robot openshift-merge-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Apr 3, 2024
@openshift-merge-robot openshift-merge-robot added needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. and removed needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. labels Apr 17, 2024

//remove finalizer
r.Log.Info("removing finalizer from the StorageClusterPeer resource")
controllerutil.RemoveFinalizer(r.storageClusterPeer, ocsv1.StorageClusterPeerFinalizer)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Incorrect, I guess: you are removing the finalizer from the in-memory object but not on the API server; you need to update here.

@rewantsoni rewantsoni force-pushed the storageclusterpeer branch 2 times, most recently from 0393c64 to 283faaa on April 29, 2024 09:22
@openshift-merge-robot openshift-merge-robot added needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. and removed needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. labels Apr 29, 2024
@rewantsoni rewantsoni force-pushed the storageclusterpeer branch from 283faaa to e9b03f3 on May 9, 2024 09:18
@openshift-merge-robot openshift-merge-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label May 9, 2024
@umangachapagain umangachapagain self-assigned this Jun 11, 2024
@rewantsoni rewantsoni marked this pull request as draft June 20, 2024 08:15
@openshift-ci openshift-ci bot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Jun 20, 2024
@openshift-merge-robot openshift-merge-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Jul 24, 2024
@openshift-merge-robot
Contributor

PR needs rebase.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@rewantsoni
Member Author

Closing this in favour of #2677

@rewantsoni rewantsoni closed this Jul 29, 2024
@rewantsoni rewantsoni deleted the storageclusterpeer branch July 30, 2024 09:02